

@Crowiant (Contributor) commented Apr 28, 2025

Closes: #44994
According to #44994, KubernetesJobOperator started failing when the parallelism option is set, beginning with version 8.4.1.

Reason: KubernetesJobOperator inherits from KubernetesPodOperator and reuses some of its methods.
One of these, get_or_create_pod, expects to find at most one matching pod during execution; if it finds more than one, it raises the exception 'More than one pod found'. That behavior is correct for KubernetesPodOperator, which manages a single pod, but a Job started by KubernetesJobOperator can legitimately run several pods (for example, when parallelism is greater than 1).
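To make the failure mode concrete, here is a minimal, self-contained Python sketch. The `Pod` dataclass and the `find_single_pod` helper are hypothetical stand-ins, not the provider's code: the helper only mimics the "at most one pod" check described above, and the `job-name` label is assumed here as the label shared by all pods of one Job.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical stand-in for a Kubernetes pod: only a name and its labels."""

    name: str
    labels: dict[str, str] = field(default_factory=dict)


def find_single_pod(pods: list[Pod], selector: dict[str, str]) -> Pod | None:
    """Hypothetical lookup mimicking the 'at most one pod' check (not the real API)."""
    # A pod matches when every selector key/value pair is among its labels.
    matches = [p for p in pods if selector.items() <= p.labels.items()]
    if len(matches) > 1:
        raise RuntimeError("More than one pod found")
    return matches[0] if matches else None


# Two pods of one Job (e.g. parallelism=2) share the same (assumed) job-name label.
job_pods = [
    Pod("my-job-abc12", {"job-name": "my-job"}),
    Pod("my-job-def34", {"job-name": "my-job"}),
]

try:
    find_single_pod(job_pods, {"job-name": "my-job"})
except RuntimeError as exc:
    print(f"Lookup failed: {exc}")  # prints "Lookup failed: More than one pod found"
```

A single-pod assumption is a reasonable invariant for a pod operator, which is exactly why reusing it unchanged breaks once one Job owns several pods.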
This PR therefore adds a new method, get_pods, which can handle several pods and better fits this operator's logic. It also introduces a new operator attribute, self.pods, which is used when handling the do_xcom_push and get_logs flags.
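A multi-pod lookup in the spirit of get_pods could look roughly like the sketch below. Everything in it is hypothetical and for illustration only: the `get_pods` signature, the `JobRunner` class and the `read_result` callback are assumptions, not the actual KubernetesJobOperator API. It shows why keeping a list such as `self.pods` lets an XCom-style result be gathered from every pod instead of from a single one.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Pod:
    """Hypothetical stand-in for a Kubernetes pod: only a name and its labels."""

    name: str
    labels: dict[str, str] = field(default_factory=dict)


def get_pods(pods: list[Pod], job_name: str, expected: int) -> list[Pod]:
    """Hypothetical multi-pod lookup: several matches are expected, not an error."""
    matches = [p for p in pods if p.labels.get("job-name") == job_name]
    if len(matches) < expected:
        # Fewer pods than requested: the Job has not started all of them yet.
        raise LookupError(
            f"Expected {expected} pods for job {job_name!r}, found {len(matches)}"
        )
    # Sort by name only so that the sketch's output is deterministic.
    return sorted(matches, key=lambda p: p.name)


class JobRunner:
    """Hypothetical operator-like class that keeps every pod in self.pods."""

    def __init__(self, job_name: str, parallelism: int, do_xcom_push: bool = True) -> None:
        self.job_name = job_name
        self.parallelism = parallelism
        self.do_xcom_push = do_xcom_push
        self.pods: list[Pod] = []

    def execute(
        self, cluster_pods: list[Pod], read_result: Callable[[Pod], Any]
    ) -> list[Any] | None:
        self.pods = get_pods(cluster_pods, self.job_name, self.parallelism)
        if not self.do_xcom_push:
            return None
        # Collect one result per pod instead of reading from a single pod.
        return [read_result(pod) for pod in self.pods]


cluster = [
    Pod("my-job-b", {"job-name": "my-job"}),
    Pod("my-job-a", {"job-name": "my-job"}),
    Pod("unrelated", {"job-name": "other"}),
]
runner = JobRunner("my-job", parallelism=2)
print(runner.execute(cluster, lambda pod: {"pod": pod.name}))
# prints [{'pod': 'my-job-a'}, {'pod': 'my-job-b'}]
```

The sorting step exists only to keep the sketch's output stable; a real implementation may order pods differently or retry while waiting for all of them to appear.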
- Change KubernetesJobTrigger to handle KubernetesJobOperator with parallelism and the deferrable flag.
- Change GKEStartJobOperator.execute_deferrable to handle multiple pods.
- Change GKEJobTrigger to handle multiple pods.
- Adjust the system test example_kubernetes_engine_job.py in the google provider to reflect the changes.
- Adjust and add unit tests in the cncf/kubernetes and google providers.
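For the deferrable path, the key difference is that a trigger has to wait for several pods rather than one. The asyncio sketch below illustrates that idea under stated assumptions: `wait_for_pods`, the `get_phase` callback and the `demo` stub are hypothetical (they are not the KubernetesJobTrigger or GKEJobTrigger API), and pod phases are the standard Kubernetes values such as `Running`, `Succeeded` and `Failed`.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable

# Kubernetes pod phases after which a pod will not change state any more.
TERMINAL_PHASES = {"Succeeded", "Failed"}


async def wait_for_pods(
    pod_names: list[str],
    get_phase: Callable[[str], Awaitable[str]],
    poll_interval: float = 1.0,
) -> dict[str, str]:
    """Hypothetical poller: return only when *every* pod is in a terminal phase."""
    phases: dict[str, str] = {}
    while True:
        for name in pod_names:
            # Skip pods that already finished; only poll the ones still running.
            if phases.get(name) not in TERMINAL_PHASES:
                phases[name] = await get_phase(name)
        if all(phases[name] in TERMINAL_PHASES for name in pod_names):
            return phases
        await asyncio.sleep(poll_interval)


async def demo() -> dict[str, str]:
    calls: dict[str, int] = {}

    async def fake_phase(name: str) -> str:
        # Hypothetical API stub: each pod reports Running on the first poll, then Succeeded.
        calls[name] = calls.get(name, 0) + 1
        return "Running" if calls[name] == 1 else "Succeeded"

    # Both pods reach Succeeded on the second polling round.
    return await wait_for_pods(["pod-a", "pod-b"], fake_phase, poll_interval=0)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

This sketch waits until all pods are terminal; a real trigger might instead stop early on the first `Failed` pod and report it.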



@boring-cyborg (bot) added the area:providers, provider:cncf-kubernetes and provider:google labels on Apr 28, 2025
@VladaZakharova (Contributor) commented:

Hi @potiuk!
Can you please check the changes here? Thanks!

@VladaZakharova (Contributor) commented:

Hi @shahar1!
Can you please check the changes here?
Thanks :)

@Crowiant force-pushed the fix-kjo-with-parallelism branch from 65676cc to 435b41b on June 6, 2025 15:10
@Crowiant requested a review from shahar1 on June 6, 2025 15:10
@Crowiant force-pushed the fix-kjo-with-parallelism branch 3 times, most recently from df96d8e to a650584, on June 6, 2025 17:30
@shahar1 (Contributor) commented Jun 20, 2025

@Crowiant / @VladaZakharova, could you please respond to @steinwaywhw's comments? Thank you!

@Crowiant force-pushed the fix-kjo-with-parallelism branch 2 times, most recently from 2946155 to 21af9e6, on June 21, 2025 18:15
@Crowiant (Contributor, Author) commented Jul 3, 2025

Hello @steinwaywhw, can you please review my answers to your comments? Thank you!

@Crowiant requested a review from steinwaywhw on July 4, 2025 09:36
@berglh commented Jul 7, 2025

@hussein-awala, @shahar1, @jedcunningham can someone with review and write access please review these changes? 🙏

@shahar1 (Contributor) left a comment

Apologies for the delay - re-reviewed it, almost there IMO :)
I'd be happy to get additional reviews, preferably from people who use or maintain the Kubernetes operators.

@berglh commented Jul 14, 2025

@Crowiant could you please address the issues raised in shahar1's review when you get a chance? Thank you :)

@Crowiant force-pushed the fix-kjo-with-parallelism branch from 21af9e6 to e5391a9 on July 15, 2025 17:07
@Crowiant requested a review from shahar1 on July 16, 2025 08:04
@shahar1 (Contributor) left a comment

LGTM with one small comment.
I'd appreciate an additional review before merging.

@Crowiant force-pushed the fix-kjo-with-parallelism branch from e5391a9 to 9dc7042 on July 22, 2025 09:43
@VladaZakharova (Contributor) commented:

@shahar1 hi there! Do you have any other questions about this change? Can we merge it?

@shahar1 merged commit e50ce94 into apache:main on Jul 24, 2025 (83 checks passed)


Successfully merging this pull request may close this issue: KubernetesJobOperator fails if you launch more than one pod

5 participants